August 15, 2014
-$17 million loss
Is it possible to leverage screenplay information to predict movie profitability and assist the decision-making process for screenplay selection?
Some features that were tried but failed to produce convincing results: readability index, sentiment analysis, tf-idf, tf-idf with POS tagging.
word2vec - Efficient Estimation of Word Representations in Vector Space (published by Google).
word2vec makes it possible to cluster words with similar meaning. These clusters can be used as features in a predictive model.
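
To make the idea of clusters as features concrete, here is a minimal sketch in Python. It is not the pipeline used for this project. The 2-D vectors, the two centroids, and the names `TOY_VECTORS`, `CENTROIDS`, `nearest_cluster` and `cluster_features` are all hypothetical. In practice the vectors would come from a trained word2vec model and the centroids from a clustering step such as k-means.

```python
from collections import Counter
import math
import re

# Hypothetical 2-D word vectors; real word2vec embeddings
# typically have hundreds of dimensions.
TOY_VECTORS = {
    "gun": (0.9, 0.1),
    "knife": (0.8, 0.2),
    "kiss": (0.1, 0.9),
    "love": (0.2, 0.8),
}

# Hypothetical cluster centroids, e.g. the output of k-means on the vectors.
CENTROIDS = {"violence": (1.0, 0.0), "romance": (0.0, 1.0)}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_cluster(word):
    """Return the label of the most similar centroid, or None if the word is unknown."""
    vec = TOY_VECTORS.get(word)
    if vec is None:
        return None
    return max(CENTROIDS, key=lambda label: cosine(vec, CENTROIDS[label]))


def cluster_features(script_text):
    """Share of in-vocabulary tokens that fall into each cluster."""
    tokens = re.findall(r"[a-z']+", script_text.lower())
    labels = [nearest_cluster(t) for t in tokens]
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {label: (counts[label] / total if total else 0.0) for label in CENTROIDS}


features = cluster_features("He drops the gun. She grabs the knife. They kiss.")
```

Each screenplay becomes a short numeric vector with one entry per cluster, which can sit alongside the budget as a predictor. Using shares rather than raw counts keeps long and short scripts comparable. That normalisation is a design choice of this sketch.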
\[ \hat{y} = x_{\text{budget}} + \sum_{i=1}^{n} x_{i,\,\text{word2vec}} \]
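
The formula lists the predictors without coefficients. One way to read it, and this is an assumption rather than something the slides state, is as shorthand for a linear model with an intercept and one weight per predictor. The sketch below evaluates such a model. The function name `predict_profit` and every number in the example are made up for illustration and are not fitted values.

```python
def predict_profit(budget, cluster_features, intercept, budget_coef, cluster_coefs):
    """Linear prediction: intercept + budget term + one term per cluster feature."""
    total = intercept + budget_coef * budget
    for name, value in cluster_features.items():
        # Clusters without a coefficient contribute nothing.
        total += cluster_coefs.get(name, 0.0) * value
    return total


# Hypothetical inputs and weights, chosen only to show the arithmetic.
example = predict_profit(
    budget=20.0,
    cluster_features={"violence": 0.5, "romance": 0.25},
    intercept=1.0,
    budget_coef=0.5,
    cluster_coefs={"violence": 4.0, "romance": -2.0},
)
```

With these made-up values the prediction is 1.0 + 0.5 × 20.0 + 4.0 × 0.5 − 2.0 × 0.25 = 12.5.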
Because we only use screenplay information, the main challenge is the identification, extraction and selection of features that may be available in movie scripts.
rmarkdown::render("MillionDollarStory_Presentation.Rmd")